43 research outputs found

    Towards accurate prediction for high-dimensional and highly-variable cloud workloads with deep learning

    This is the author accepted manuscript. The final version is available from IEEE via the DOI in this record. Resource provisioning for cloud computing necessitates adaptive and accurate prediction of cloud workloads. However, existing methods cannot effectively predict high-dimensional and highly-variable cloud workloads, which results in wasted resources and an inability to satisfy service level agreements (SLAs). Since the recurrent neural network (RNN) is naturally suited to sequential data analysis, it has recently been used to tackle the problem of workload prediction. However, RNNs often perform poorly at learning long-term memory dependencies and thus cannot make accurate predictions of workloads. To address these challenges, we propose a deep Learning based Prediction Algorithm for cloud Workloads (L-PAW). First, a top-sparse auto-encoder (TSA) is designed to effectively extract the essential representations of workloads from the original high-dimensional workload data. Next, we integrate the TSA and gated recurrent unit (GRU) blocks into the RNN to achieve adaptive and accurate prediction of highly-variable workloads. Using real-world workload traces from Google and Alibaba cloud data centers and a DUX-based cluster, extensive experiments are conducted to demonstrate the effectiveness and adaptability of the L-PAW for different types of workloads with various prediction lengths. Moreover, the performance results show that the L-PAW achieves superior prediction accuracy compared to classic RNN-based and other workload prediction methods for high-dimensional and highly-variable real-world cloud workloads.
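
    The architecture described above combines a sparse auto-encoder (for dimensionality reduction) with a GRU-based recurrent network (for long-term dependencies). Below is a minimal PyTorch sketch of that general idea, not the authors' L-PAW implementation; the layer sizes, the L1 sparsity penalty standing in for the "top-sparse" mechanism, and all names are illustrative assumptions.

```python
# Sketch: sparse auto-encoder feeding a GRU workload predictor.
# Sizes, the L1 sparsity term, and names are assumptions, not L-PAW itself.
import torch
import torch.nn as nn

class SparseEncoder(nn.Module):
    def __init__(self, input_dim, code_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())
        self.decoder = nn.Linear(code_dim, input_dim)

    def forward(self, x):
        code = self.encoder(x)              # compressed representation
        recon = self.decoder(code)          # reconstruction used for training
        return code, recon

class WorkloadPredictor(nn.Module):
    def __init__(self, input_dim, code_dim=32, hidden_dim=64):
        super().__init__()
        self.sae = SparseEncoder(input_dim, code_dim)
        self.gru = nn.GRU(code_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # next-step workload value

    def forward(self, seq):                   # seq: (batch, time, input_dim)
        b, t, d = seq.shape
        flat = seq.reshape(b * t, d)
        code, recon = self.sae(flat)
        out, _ = self.gru(code.reshape(b, t, -1))
        return self.head(out[:, -1]), recon, flat

model = WorkloadPredictor(input_dim=10)
seq = torch.randn(8, 20, 10)                  # toy batch of workload windows
pred, recon, flat = model(seq)
loss = (nn.functional.mse_loss(pred, torch.randn(8, 1))
        + nn.functional.mse_loss(recon, flat)
        + 1e-3 * model.sae.encoder(flat).abs().mean())  # sparsity penalty
loss.backward()
```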

    Improving the scalability of parallel N-body applications with an event driven constraint based execution model

    The scalability and efficiency of graph applications are significantly constrained by conventional systems and their supporting programming models. Technology trends such as multicore, manycore, and heterogeneous system architectures introduce further challenges and opportunities for emerging application domains such as graph applications. This paper explores the space of effective parallel execution of ephemeral graphs that are dynamically generated, using the Barnes-Hut algorithm to exemplify dynamic workloads. The workloads are expressed using the semantics of an Exascale computing execution model called ParalleX; for comparison, results using conventional execution model semantics are also presented. We find that the advanced semantics for Exascale computing improve load balancing at runtime and enable automatic parallelism discovery, improving efficiency.
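
    The core idea contrasted above is replacing bulk-synchronous phases with fine-grained tasks whose results become available as futures. A rough Python sketch of that idea applied to a Barnes-Hut-style tree walk follows; the 1-D tree layout, task granularity, and thread-pool execution are illustrative assumptions (ParalleX/HPX uses lightweight threads and distributed objects, not a local pool).

```python
# Rough sketch of future-based, per-body task execution for a Barnes-Hut-style
# tree walk. Layout and granularity are illustrative assumptions only.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

@dataclass
class Node:
    mass: float
    center: float                      # 1-D position for brevity
    size: float
    children: list = field(default_factory=list)

def force(body_pos, node, theta=0.5):
    """Approximate force on a body from a subtree (Barnes-Hut criterion)."""
    dist = abs(node.center - body_pos) or 1e-9
    if not node.children or node.size / dist < theta:
        return node.mass / dist ** 2   # treat the whole cell as one mass
    return sum(force(body_pos, c, theta) for c in node.children)

def forces(bodies, root):
    # Each body's force computation is an independent task; results arrive
    # as futures rather than after a global barrier.
    with ThreadPoolExecutor() as pool:
        futs = [pool.submit(force, b, root) for b in bodies]
        return [f.result() for f in futs]

leaf = lambda m, x: Node(m, x, 0.1)
root = Node(3.0, 0.5, 1.0, [leaf(1.0, 0.1), leaf(2.0, 0.8)])
print(forces([0.0, 0.4, 0.9], root))
```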

    Adaptive and Efficient Resource Allocation in Cloud Datacenters Using Actor-Critic Deep Reinforcement Learning

    This is the author accepted manuscript. The final version is available from IEEE via the DOI in this record. The ever-expanding scale of cloud datacenters necessitates automated resource provisioning to best meet the requirements of low latency and high energy efficiency. However, due to dynamic system states and varied user demands, efficient resource allocation in the cloud faces huge challenges. Most existing solutions for cloud resource allocation cannot effectively handle dynamic cloud environments because they depend on prior knowledge of the cloud system, which may lead to excessive energy consumption and degraded Quality-of-Service (QoS). To address this problem, we propose an adaptive and efficient cloud resource allocation scheme based on Actor-Critic Deep Reinforcement Learning (DRL). First, the actor parameterizes the policy (allocating resources) and chooses actions (scheduling jobs) based on the scores assessed by the critic (evaluating actions). Next, the resource allocation policy is updated using gradient ascent, while the variance of the policy gradient is reduced with an advantage function, which improves the training efficiency of the proposed method. We conduct extensive simulation experiments using real-world data from Google cloud datacenters. The results show that our method obtains superior QoS in terms of latency and job dismissing rate with enhanced energy efficiency, compared to two advanced DRL-based and five classic cloud resource allocation methods.
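
    The update rule described above (a policy-gradient actor with a critic-provided advantage baseline) can be sketched in a few lines of PyTorch. The network sizes, reward signal, and environment interface below are placeholders for illustration, not the paper's actual system.

```python
# Minimal advantage actor-critic update sketch. Network sizes, the reward,
# and the environment interface are illustrative placeholders only.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)   # scores scheduling actions
        self.critic = nn.Linear(hidden, 1)          # estimates state value

    def forward(self, state):
        h = self.body(state)
        return torch.distributions.Categorical(logits=self.actor(h)), self.critic(h)

model = ActorCritic(state_dim=8, n_actions=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

state = torch.randn(8)                  # e.g. cluster load + job features
dist, value = model(state)
action = dist.sample()                  # which machine/slot gets the job
reward = torch.tensor(1.0)              # placeholder QoS/energy reward
next_value = torch.tensor(0.0)          # bootstrapped value of next state

advantage = reward + 0.99 * next_value - value       # advantage baseline
actor_loss = -dist.log_prob(action) * advantage.detach()  # policy gradient
critic_loss = advantage.pow(2)                             # value regression
(actor_loss + critic_loss).mean().backward()
opt.step()
```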

    The Promise of High-Performance Reconfigurable Computing


    Software engineering techniques for the development of systems of systems

    This paper investigates how existing software engineering techniques can be employed, adapted and integrated for the development of systems of systems. Starting from existing system-of-systems (SoS) studies, we identify computing paradigms and techniques that have the potential to help address the challenges associated with SoS development, and propose an SoS development framework that combines these techniques in a novel way. This framework addresses the development of a class of IT systems of systems characterised by high variability in the types of interactions between their component systems, and by relatively small numbers of such interactions. We describe how the framework supports the dynamic, automated generation of the system interfaces required to achieve these interactions, and present a case study illustrating the development of a data-centre SoS using the new framework.
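
    As a toy illustration of the kind of dynamic, automated interface generation mentioned above, the Python sketch below builds a narrowed proxy between two component systems at runtime from a declarative description. The descriptor format and adapter logic are hypothetical and only convey the general idea, not the framework's actual mechanism.

```python
# Toy illustration: generate an interface between component systems at runtime
# from a declarative description. Descriptor format and adapter logic are
# hypothetical, not the framework's actual mechanism.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class SystemDescriptor:
    name: str
    operations: Dict[str, Callable]      # operation name -> implementation

def generate_interface(consumer_needs, provider: SystemDescriptor):
    """Build a proxy exposing only the operations the consumer requires."""
    missing = [op for op in consumer_needs if op not in provider.operations]
    if missing:
        raise ValueError(f"{provider.name} cannot provide: {missing}")
    proxy = type(f"{provider.name}Interface", (), {
        op: staticmethod(provider.operations[op]) for op in consumer_needs
    })
    return proxy()

# Example: a monitoring system consuming capacity data from a datacentre system.
datacentre = SystemDescriptor("DataCentre", {
    "capacity": lambda: 128,             # free server slots (dummy value)
    "power_draw": lambda: 42.0,
})
monitor_view = generate_interface(["capacity"], datacentre)
print(monitor_view.capacity())           # -> 128
```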

    Parallel and distributed computing for data mining
